Overview

Brought to you by YData

Dataset statistics

Number of variables: 12
Number of observations: 5000
Missing cells: 8461
Missing cells (%): 14.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 468.9 KiB
Average record size in memory: 96.0 B
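As a cross-check, the headline figures follow from simple arithmetic. The missing-cell count is the sum of the per-column counts listed under Alerts, the percentage divides it by all cells (variables × observations), and the average record size divides total memory by the number of rows. A minimal sketch, assuming 1 KiB = 1024 bytes:

```python
# Figures copied from the Dataset statistics and Alerts sections.
n_variables = 12
n_observations = 5000
missing_per_column = {
    "nin": 2062, "passport": 2481, "drivers_license": 999,
    "voters_card": 1001, "tax_id": 952, "cac_number": 966,
}
total_memory_kib = 468.9

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)
avg_record_bytes = total_memory_kib * 1024 / n_observations  # assumes 1 KiB = 1024 B

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")  # Missing cells: 8461 (14.1%)
print(f"Average record size: {avg_record_bytes:.1f} B")        # Average record size: 96.0 B
```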

Variable types

Numeric: 4
Text: 6
Categorical: 2

Alerts

customer_id is highly correlated overall with gender and 1 other field (marital_status) [High correlation]
gender is highly correlated overall with customer_id and 1 other field (tax_id) [High correlation]
marital_status is highly correlated overall with customer_id and 1 other field (tax_id) [High correlation]
tax_id is highly correlated overall with gender and 1 other field (marital_status) [High correlation]
nin has 2062 (41.2%) missing values [Missing]
passport has 2481 (49.6%) missing values [Missing]
drivers_license has 999 (20.0%) missing values [Missing]
voters_card has 1001 (20.0%) missing values [Missing]
tax_id has 952 (19.0%) missing values [Missing]
cac_number has 966 (19.3%) missing values [Missing]
customer_id has unique values [Unique]

Reproduction

Analysis started: 2025-01-16 11:36:40.643159
Analysis finished: 2025-01-16 11:54:12.084108
Duration: 17 minutes and 31.44 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
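The reported duration is the difference between the two timestamps. A sketch with the standard library's `datetime`:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2025-01-16 11:36:40.643159")
finished = datetime.fromisoformat("2025-01-16 11:54:12.084108")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 17 minutes and 31.44 seconds
```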

Variables

customer_id
Real number (ℝ)

High correlation, Unique

Distinct: 5000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.486769 × 10^10
Minimum: 1.0027741 × 10^10
Maximum: 9.9997026 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 1.0027741 × 10^10
5th percentile: 1.4736439 × 10^10
Q1: 3.2364211 × 10^10
Median: 5.4868531 × 10^10
Q3: 7.716611 × 10^10
95th percentile: 9.5623328 × 10^10
Maximum: 9.9997026 × 10^10
Range: 8.9969285 × 10^10
Interquartile range (IQR): 4.4801899 × 10^10
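The quartiles, percentiles, range and IQR are order statistics with interpolation between neighbouring values. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, assumed here to match the linear interpolation that pandas and NumPy use by default; the four-value input is toy data, not taken from this report.

```python
import statistics

def quantile_summary(values):
    """Quantile statistics with linear interpolation between order statistics."""
    # n=4 returns the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # n=20 returns 19 cut points: the first is the 5th percentile, the last the 95th.
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "minimum": lo,
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "maximum": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }

summary = quantile_summary([1, 2, 3, 4])  # toy data, not from the report
print(summary)
```

On the reported customer_id figures the same identities hold: Range = 9.9997026 × 10^10 - 1.0027741 × 10^10 = 8.9969285 × 10^10, and IQR = Q3 - Q1 = 4.4801899 × 10^10.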

Descriptive statistics

Standard deviation: 2.5988812 × 10^10
Coefficient of variation (CV): 0.47366332
Kurtosis: -1.1957044
Mean: 5.486769 × 10^10
Median Absolute Deviation (MAD): 2.2419024 × 10^10
Skewness: 0.0022546826
Sum: 2.7433845 × 10^14
Variance: 6.7541834 × 10^20
Monotonicity: Not monotonic
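Several descriptive statistics are tied together by simple identities: CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing values. The sketch recomputes them from the reported customer_id values with a relative tolerance, since the report rounds to about eight significant figures. The MAD helper assumes the report's MAD is the unscaled median of absolute deviations from the median; that definition is an assumption.

```python
import math
import statistics

# Values reported for customer_id.
reported = {
    "mean": 5.486769e10,
    "std": 2.5988812e10,
    "cv": 0.47366332,
    "variance": 6.7541834e20,
    "sum": 2.7433845e14,
}
n = 5000  # non-missing observations

derived = {
    "cv": reported["std"] / reported["mean"],  # CV = standard deviation / mean
    "variance": reported["std"] ** 2,          # variance = standard deviation squared
    "sum": reported["mean"] * n,               # sum = mean * count
}

for name, value in derived.items():
    consistent = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived {value:.8g}, reported {reported[name]:.8g}, consistent: {consistent}")


def median_absolute_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median (assumed definition)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)
```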
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
4.247409821 × 10^10 | 1 | < 0.1%
8.972423399 × 10^10 | 1 | < 0.1%
7.129482598 × 10^10 | 1 | < 0.1%
9.984909754 × 10^10 | 1 | < 0.1%
3.279537511 × 10^10 | 1 | < 0.1%
9.6176223 × 10^10 | 1 | < 0.1%
7.736425226 × 10^10 | 1 | < 0.1%
9.573946874 × 10^10 | 1 | < 0.1%
1.302285831 × 10^10 | 1 | < 0.1%
9.833272018 × 10^10 | 1 | < 0.1%
Other values (4990) | 4990 | 99.8%

Minimum 10 values

Value | Count | Frequency (%)
1.002774079 × 10^10 | 1 | < 0.1%
1.003222352 × 10^10 | 1 | < 0.1%
1.003661683 × 10^10 | 1 | < 0.1%
1.005587181 × 10^10 | 1 | < 0.1%
1.006153475 × 10^10 | 1 | < 0.1%
1.007176375 × 10^10 | 1 | < 0.1%
1.00747918 × 10^10 | 1 | < 0.1%
1.010684038 × 10^10 | 1 | < 0.1%
1.018308557 × 10^10 | 1 | < 0.1%
1.024259789 × 10^10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
9.999702598 × 10^10 | 1 | < 0.1%
9.998872 × 10^10 | 1 | < 0.1%
9.997194328 × 10^10 | 1 | < 0.1%
9.992913142 × 10^10 | 1 | < 0.1%
9.991784745 × 10^10 | 1 | < 0.1%
9.990584014 × 10^10 | 1 | < 0.1%
9.988322142 × 10^10 | 1 | < 0.1%
9.988153451 × 10^10 | 1 | < 0.1%
9.987927393 × 10^10 | 1 | < 0.1%
9.987727271 × 10^10 | 1 | < 0.1%

occupation
Text

Distinct: 638
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

Length

Max length: 59
Median length: 40
Mean length: 20.6998
Min length: 3
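The mean length is total characters divided by the number of non-missing values: 103499 / 5000 = 20.6998. The reported median length of 40 does not square with that mean: with a plain median of 40, the upper 2500 values would have at least 40 characters and the rest at least 3, forcing a mean of at least 21.5. The report's median length is therefore presumably computed another way (country_of_birth shows the same tension, with median 33 and mean 10.7664). The sketch below computes the ordinary median over per-value lengths, so its median may not match the report.

```python
import statistics

def length_stats(values):
    """Length statistics over the non-missing strings of a column."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),  # equals total characters / count
        "min": min(lengths),
        "total_characters": sum(lengths),
    }

# The first five occupation values from the Sample section.
sample = ["Equality and diversity officer", "Occupational therapist",
          "Physiotherapist", "Psychologist, occupational", "Hydrologist"]
print(length_stats(sample))

# If the plain median length were 40, the upper 2500 of 5000 values would have
# at least 40 characters and the rest at least 3 (the minimum length).
lower_bound_on_mean = (2500 * 40 + 2500 * 3) / 5000
print(lower_bound_on_mean, 103499 / 5000)  # 21.5 20.6998
```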

Characters and Unicode

Total characters: 103499
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Equality and diversity officer
2nd row: Occupational therapist
3rd row: Physiotherapist
4th row: Psychologist, occupational
5th row: Hydrologist

Most common words

Value | Count | Frequency (%)
officer | 439 | 3.8%
engineer | 421 | 3.7%
manager | 417 | 3.6%
scientist | 212 | 1.9%
designer | 204 | 1.8%
surveyor | 169 | 1.5%
and | 157 | 1.4%
education | 130 | 1.1%
therapist | 121 | 1.1%
teacher | 115 | 1.0%
Other values (524) | 9071 | 79.2%
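The word table counts lowercased tokens across all values rather than whole values: the ten listed words plus the 9071 others sum to 11456 tokens for 5000 rows, and 439 / 11456 gives the 3.8% shown for "officer". The sketch approximates this by lowercasing and extracting alphanumeric runs with a regular expression; ydata-profiling's actual tokenization (punctuation and stop-word handling) may differ, so treat it as an approximation.

```python
import re
from collections import Counter

def word_table(values, top=10):
    """Approximate word table: lowercase each value, extract alphanumeric tokens, count them."""
    counter = Counter()
    for v in values:
        if v is not None:
            counter.update(re.findall(r"\w+", v.lower()))
    total = sum(counter.values())
    rows = [(word, n, round(100 * n / total, 1)) for word, n in counter.most_common(top)]
    other = total - sum(n for _, n, _ in rows)
    return rows, other, total

rows, other, total = word_table(
    ["Equality and diversity officer", "Stage manager", "Therapist, art",
     "Equality and diversity officer"],
    top=3,
)
print(rows)          # [('equality', 2, 16.7), ('and', 2, 16.7), ('diversity', 2, 16.7)]
print(other, total)  # 6 12
```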

Most occurring characters

Value | Count | Frequency (%)
e | 10381 | 10.0%
i | 9136 | 8.8%
r | 8719 | 8.4%
a | 7761 | 7.5%
t | 7255 | 7.0%
n | 7084 | 6.8%
(space) | 6456 | 6.2%
o | 5991 | 5.8%
s | 5576 | 5.4%
c | 5027 | 4.9%
Other values (44) | 30113 | 29.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 103499 | 100.0%
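Each character table lists the ten most frequent characters and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct characters (54 - 10 = 44 for occupation). A minimal sketch of that layout, using the three marital_status categories as toy input:

```python
from collections import Counter

def character_table(values, top=10):
    """Top characters with counts and percentages, plus an 'Other values (k)' row."""
    counter = Counter()
    for v in values:
        if v is not None:
            counter.update(v)  # iterating over a string yields its characters
    total = sum(counter.values())
    ranked = counter.most_common()
    rows = [(ch, n, round(100 * n / total, 1)) for ch, n in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows

for row in character_table(["Single", "Married", "Widowed"], top=3):
    print(row)
```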

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 103499 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 103499 | 100.0%

nin
Real number (ℝ)

Missing 

Distinct: 2938
Distinct (%): 100.0%
Missing: 2062
Missing (%): 41.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.3994607 × 10^10
Minimum: 1.003104 × 10^10
Maximum: 9.9987051 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB
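nin reports a Distinct (%) of 100.0% despite 41.2% missing values, so the distinct share is evidently taken over the 2938 non-missing values, while Missing (%) is taken over all 5000 rows (2062 / 5000 = 41.2%). marital_status fits the same reading (3 / 5000 ≈ 0.1%). A sketch of both ratios, using None for a missing value and a few nin values from the Sample section:

```python
def distinct_and_missing(values):
    """Distinct share over non-missing values; missing share over all rows (None = missing)."""
    present = [v for v in values if v is not None]
    n_distinct = len(set(present))
    n_missing = len(values) - len(present)
    return {
        "distinct": n_distinct,
        "distinct_pct": 100 * n_distinct / len(present),
        "missing": n_missing,
        "missing_pct": 100 * n_missing / len(values),
    }

print(distinct_and_missing([7.250263e10, None, 9.714721e10, 5.301804e10, None]))
# {'distinct': 3, 'distinct_pct': 100.0, 'missing': 2, 'missing_pct': 40.0}
```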

Quantile statistics

Minimum: 1.003104 × 10^10
5th percentile: 1.3980859 × 10^10
Q1: 3.1371027 × 10^10
Median: 5.3718349 × 10^10
Q3: 7.6632174 × 10^10
95th percentile: 9.5309973 × 10^10
Maximum: 9.9987051 × 10^10
Range: 8.9956011 × 10^10
Interquartile range (IQR): 4.5261147 × 10^10

Descriptive statistics

Standard deviation: 2.608408 × 10^10
Coefficient of variation (CV): 0.48308676
Kurtosis: -1.2139339
Mean: 5.3994607 × 10^10
Median Absolute Deviation (MAD): 2.2604135 × 10^10
Skewness: 0.040599515
Sum: 1.5863616 × 10^14
Variance: 6.803792 × 10^20
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
8.383041664 × 10^10 | 1 | < 0.1%
9.21698988 × 10^10 | 1 | < 0.1%
6.914023413 × 10^10 | 1 | < 0.1%
8.754477936 × 10^10 | 1 | < 0.1%
6.542377583 × 10^10 | 1 | < 0.1%
5.539080371 × 10^10 | 1 | < 0.1%
3.783526885 × 10^10 | 1 | < 0.1%
8.312932412 × 10^10 | 1 | < 0.1%
1.277674156 × 10^10 | 1 | < 0.1%
7.972256754 × 10^10 | 1 | < 0.1%
Other values (2928) | 2928 | 58.6%
(Missing) | 2062 | 41.2%

Minimum 10 values

Value | Count | Frequency (%)
1.003104043 × 10^10 | 1 | < 0.1%
1.00372131 × 10^10 | 1 | < 0.1%
1.006946883 × 10^10 | 1 | < 0.1%
1.008788974 × 10^10 | 1 | < 0.1%
1.009964581 × 10^10 | 1 | < 0.1%
1.010775923 × 10^10 | 1 | < 0.1%
1.010875228 × 10^10 | 1 | < 0.1%
1.015564269 × 10^10 | 1 | < 0.1%
1.019120062 × 10^10 | 1 | < 0.1%
1.025798237 × 10^10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
9.998705094 × 10^10 | 1 | < 0.1%
9.9942715 × 10^10 | 1 | < 0.1%
9.990515697 × 10^10 | 1 | < 0.1%
9.98944566 × 10^10 | 1 | < 0.1%
9.978981024 × 10^10 | 1 | < 0.1%
9.976167637 × 10^10 | 1 | < 0.1%
9.973182168 × 10^10 | 1 | < 0.1%
9.971547993 × 10^10 | 1 | < 0.1%
9.949619705 × 10^10 | 1 | < 0.1%
9.946956435 × 10^10 | 1 | < 0.1%

passport
Text

Missing 

Distinct: 2519
Distinct (%): 100.0%
Missing: 2481
Missing (%): 49.6%
Memory size: 39.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 8.5204446
Min length: 8

Characters and Unicode

Total characters: 21463
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2519
Unique (%): 100.0%

Sample

1st row: XO6610220
2nd row: MK8799712
3rd row: UZ2368406
4th row: F0385929
5th row: V2133231
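The sample values share a simple layout: one or two uppercase letters followed by seven digits, which fits the 8 to 9 character lengths and the 36 distinct characters (26 letters plus 10 digits). The pattern is inferred from the five sample rows only and is an assumption, not a documented passport format. Because every value has 8 or 9 characters, the totals also pin down how many values carry a two-letter prefix.

```python
import re

# Inferred from the sample rows: one or two uppercase letters, then seven digits.
PASSPORT_RE = re.compile(r"[A-Z]{1,2}[0-9]{7}")

def looks_like_passport(value):
    return value is not None and PASSPORT_RE.fullmatch(value) is not None

samples = ["XO6610220", "MK8799712", "UZ2368406", "F0385929", "V2133231"]
print(all(looks_like_passport(s) for s in samples))  # True

# Each value has 8 or 9 characters, so the characters beyond 8 per value
# count the 9-character (two-letter) values.
n_present, total_chars = 2519, 21463
nine_char_values = total_chars - 8 * n_present
print(nine_char_values)  # 1311
```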

Most common words

Value | Count | Frequency (%)
wm3915519 | 1 | < 0.1%
be2758840 | 1 | < 0.1%
qe8551565 | 1 | < 0.1%
hi3522339 | 1 | < 0.1%
f5805501 | 1 | < 0.1%
i7403791 | 1 | < 0.1%
bg0790627 | 1 | < 0.1%
m3815892 | 1 | < 0.1%
w8863959 | 1 | < 0.1%
os8011310 | 1 | < 0.1%
Other values (2509) | 2509 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
8 | 1866 | 8.7%
4 | 1836 | 8.6%
9 | 1803 | 8.4%
5 | 1775 | 8.3%
1 | 1773 | 8.3%
3 | 1766 | 8.2%
7 | 1761 | 8.2%
2 | 1697 | 7.9%
6 | 1687 | 7.9%
0 | 1669 | 7.8%
Other values (26) | 3830 | 17.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 21463 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 21463 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 21463 | 100.0%


country_of_birth
Text

Distinct: 243
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

Length

Max length: 51
Median length: 33
Mean length: 10.7664
Min length: 4

Characters and Unicode

Total characters: 53832
Distinct characters: 59
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Malaysia
2nd row: American Samoa
3rd row: Cook Islands
4th row: Mauritius
5th row: Zimbabwe

Most common words

Value | Count | Frequency (%)
islands | 357 | 4.6%
and | 243 | 3.1%
saint | 158 | 2.0%
republic | 139 | 1.8%
united | 105 | 1.3%
island | 95 | 1.2%
south | 78 | 1.0%
french | 72 | 0.9%
states | 63 | 0.8%
the | 63 | 0.8%
Other values (298) | 6407 | 82.4%

Most occurring characters

Value | Count | Frequency (%)
a | 7364 | 13.7%
n | 4454 | 8.3%
i | 4300 | 8.0%
e | 3713 | 6.9%
r | 3088 | 5.7%
(space) | 2780 | 5.2%
o | 2664 | 4.9%
t | 2281 | 4.2%
l | 2261 | 4.2%
s | 2209 | 4.1%
Other values (49) | 18718 | 34.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 53832 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 53832 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 53832 | 100.0%

marital_status
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

Single: 1718
Married: 1698
Widowed: 1584

Length

Max length: 7
Median length: 7
Mean length: 6.6564
Min length: 6
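The length statistics follow directly from the category counts: every value is "Single" (6 characters) or "Married"/"Widowed" (7 characters each). A short check:

```python
import statistics

# Category counts for marital_status, from the Common Values table.
counts = {"Single": 1718, "Married": 1698, "Widowed": 1584}

total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / sum(counts.values())

# Expand to one length per row to take the median (5000 values is small).
lengths = [len(label) for label, n in counts.items() for _ in range(n)]
median_length = statistics.median(lengths)

print(total_chars, mean_length, median_length)  # 33282 6.6564 7.0
```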

Characters and Unicode

Total characters: 33282
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Single
2nd row: Widowed
3rd row: Married
4th row: Married
5th row: Single

Common Values

Value | Count | Frequency (%)
Single | 1718 | 34.4%
Married | 1698 | 34.0%
Widowed | 1584 | 31.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most common words

Value | Count | Frequency (%)
single | 1718 | 34.4%
married | 1698 | 34.0%
widowed | 1584 | 31.7%

Most occurring characters

Value | Count | Frequency (%)
i | 5000 | 15.0%
e | 5000 | 15.0%
d | 4866 | 14.6%
r | 3396 | 10.2%
S | 1718 | 5.2%
l | 1718 | 5.2%
g | 1718 | 5.2%
n | 1718 | 5.2%
M | 1698 | 5.1%
a | 1698 | 5.1%
Other values (3) | 4752 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 33282 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 33282 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 33282 | 100.0%

drivers_license
Text

Missing 

Distinct: 4001
Distinct (%): 100.0%
Missing: 999
Missing (%): 20.0%
Memory size: 39.2 KiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 68017
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4001
Unique (%): 100.0%

Sample

1st row: U-ILUAQ44-1396390
2nd row: L-PRCHX61-1885792
3rd row: O-XOJTJ30-0791028
4th row: P-ZEZIS45-1063384
5th row: X-LNUWH93-5717866
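The sample values share a fixed layout, consistent with the constant length of 17 and with the 8002 hyphens (two per value, 2 × 4001): one uppercase letter, a hyphen, five uppercase letters and two digits, a hyphen, then seven digits. The layout is inferred from the five sample rows, and the group names in the sketch (`prefix`, `code`, `digits`, `serial`) are hypothetical labels, not documented fields.

```python
import re

# Layout inferred from the sample rows; the group names are hypothetical labels.
LICENSE_RE = re.compile(
    r"(?P<prefix>[A-Z])-(?P<code>[A-Z]{5})(?P<digits>[0-9]{2})-(?P<serial>[0-9]{7})"
)

def parse_license(value):
    """Split a license number into its parts, or return None if it does not fit the layout."""
    match = LICENSE_RE.fullmatch(value)
    return match.groupdict() if match else None

print(parse_license("U-ILUAQ44-1396390"))
# {'prefix': 'U', 'code': 'ILUAQ', 'digits': '44', 'serial': '1396390'}
print(parse_license("U-ILUAQ4-1396390"))  # None (only one digit after the letters)
```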

Most common words

Value | Count | Frequency (%)
p-rafgd72-9104520 | 1 | < 0.1%
v-tfefc51-9856888 | 1 | < 0.1%
e-yggvj59-3936940 | 1 | < 0.1%
y-ujkrp79-2105588 | 1 | < 0.1%
f-nnnuz17-1389761 | 1 | < 0.1%
s-ilqju98-9461441 | 1 | < 0.1%
b-ggdto43-1463479 | 1 | < 0.1%
q-occaa62-7022888 | 1 | < 0.1%
j-trytm56-9182793 | 1 | < 0.1%
l-acknv01-8573658 | 1 | < 0.1%
Other values (3991) | 3991 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
- | 8002 | 11.8%
1 | 3690 | 5.4%
9 | 3649 | 5.4%
3 | 3646 | 5.4%
7 | 3601 | 5.3%
5 | 3598 | 5.3%
8 | 3585 | 5.3%
2 | 3573 | 5.3%
0 | 3561 | 5.2%
4 | 3556 | 5.2%
Other values (27) | 27556 | 40.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 68017 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 68017 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 68017 | 100.0%

voters_card
Text

Missing 

Distinct: 3999
Distinct (%): 100.0%
Missing: 1001
Missing (%): 20.0%
Memory size: 39.2 KiB

Length

Max length: 16
Median length: 16
Mean length: 16
Min length: 16

Characters and Unicode

Total characters: 63984
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3999
Unique (%): 100.0%

Sample

1st row: XPY/186/07/20045
2nd row: MSA/775/26/44500
3rd row: BKQ/568/35/91728
4th row: LIA/225/36/59688
5th row: VIC/056/69/83740
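Every sample value splits on "/" into four fields of 3 letters, 3 digits, 2 digits and 5 digits (3 + 1 + 3 + 1 + 2 + 1 + 5 = 16 characters), which matches the fixed length of 16 and the 11997 slashes (three per value, 3 × 3999). The layout is inferred from the sample rows, not from a documented format; the sketch checks it field by field instead of with a regular expression.

```python
def looks_like_voters_card(value):
    """Check the LLL/DDD/DD/DDDDD layout seen in the sample rows (inferred, not documented)."""
    parts = value.split("/")
    checks = [
        (3, lambda p: p.isascii() and p.isalpha() and p.isupper()),
        (3, lambda p: p.isascii() and p.isdigit()),
        (2, lambda p: p.isascii() and p.isdigit()),
        (5, lambda p: p.isascii() and p.isdigit()),
    ]
    return len(parts) == len(checks) and all(
        len(part) == size and check(part) for part, (size, check) in zip(parts, checks)
    )

samples = ["XPY/186/07/20045", "MSA/775/26/44500", "BKQ/568/35/91728",
           "LIA/225/36/59688", "VIC/056/69/83740"]
print([looks_like_voters_card(s) for s in samples])  # [True, True, True, True, True]
```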

Most common words

Value | Count | Frequency (%)
mbr/062/12/29701 | 1 | < 0.1%
pue/451/15/31868 | 1 | < 0.1%
htg/095/02/17329 | 1 | < 0.1%
fki/123/32/16265 | 1 | < 0.1%
cha/802/00/59019 | 1 | < 0.1%
dtb/263/46/30656 | 1 | < 0.1%
oay/198/97/37923 | 1 | < 0.1%
rke/290/24/12801 | 1 | < 0.1%
bpn/837/80/85131 | 1 | < 0.1%
cdo/939/29/41441 | 1 | < 0.1%
Other values (3989) | 3989 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
/ | 11997 | 18.8%
6 | 4164 | 6.5%
7 | 4069 | 6.4%
0 | 4056 | 6.3%
2 | 4027 | 6.3%
8 | 3982 | 6.2%
9 | 3980 | 6.2%
3 | 3970 | 6.2%
5 | 3940 | 6.2%
4 | 3916 | 6.1%
Other values (27) | 15883 | 24.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 63984 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 63984 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 63984 | 100.0%

tax_id
Real number (ℝ)

High correlation, Missing

Distinct: 4048
Distinct (%): 100.0%
Missing: 952
Missing (%): 19.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.0033818 × 10^9
Minimum: 2116573
Maximum: 9.9951354 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 2116573
5th percentile: 4.8036053 × 10^8
Q1: 2.5513923 × 10^9
Median: 4.9964317 × 10^9
Q3: 7.4661577 × 10^9
95th percentile: 9.5221473 × 10^9
Maximum: 9.9951354 × 10^9
Range: 9.9930188 × 10^9
Interquartile range (IQR): 4.9147655 × 10^9

Descriptive statistics

Standard deviation: 2.8622933 × 10^9
Coefficient of variation (CV): 0.57207174
Kurtosis: -1.1621938
Mean: 5.0033818 × 10^9
Median Absolute Deviation (MAD): 2.4556884 × 10^9
Skewness: 0.0005591873
Sum: 2.025369 × 10^13
Variance: 8.1927232 × 10^18
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
8988517976 | 1 | < 0.1%
6088010554 | 1 | < 0.1%
6862142259 | 1 | < 0.1%
3648292346 | 1 | < 0.1%
1681918254 | 1 | < 0.1%
5778751994 | 1 | < 0.1%
9514932667 | 1 | < 0.1%
4892378526 | 1 | < 0.1%
7304356593 | 1 | < 0.1%
8518911432 | 1 | < 0.1%
Other values (4038) | 4038 | 80.8%
(Missing) | 952 | 19.0%

Minimum 10 values

Value | Count | Frequency (%)
2116573 | 1 | < 0.1%
4180345 | 1 | < 0.1%
7472042 | 1 | < 0.1%
9873508 | 1 | < 0.1%
11642587 | 1 | < 0.1%
12567432 | 1 | < 0.1%
17285827 | 1 | < 0.1%
26098980 | 1 | < 0.1%
27812918 | 1 | < 0.1%
36794721 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
9995135373 | 1 | < 0.1%
9994960899 | 1 | < 0.1%
9994043807 | 1 | < 0.1%
9991878335 | 1 | < 0.1%
9987801168 | 1 | < 0.1%
9986881900 | 1 | < 0.1%
9983897102 | 1 | < 0.1%
9979310343 | 1 | < 0.1%
9976985254 | 1 | < 0.1%
9975649558 | 1 | < 0.1%

cac_number
Text

Missing 

Distinct: 4034
Distinct (%): 100.0%
Missing: 966
Missing (%): 19.3%
Memory size: 39.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 40340
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4034
Unique (%): 100.0%

Sample

1st row: BN/6363282
2nd row: BN/7765086
3rd row: RC/7145711
4th row: RC/7292868
5th row: BN/8508999
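The character table can be reconciled with the fixed 10-character layout seen in the sample (a two-letter prefix, a slash, seven digits). The nine digits shown account for 25476 of the 4034 × 7 = 28238 digits, leaving 2762 occurrences of the digit 3; adding the 2 × 4034 = 8068 prefix letters gives the 10830 characters in "Other values (5)", which would be 3, B, N, R and C if BN and RC are the only prefixes, as in the sample. A sketch of that arithmetic, plus a prefix count over the sample rows:

```python
from collections import Counter

n_values = 4034        # non-missing cac_number values
digits_per_value = 7   # e.g. BN/6363282: seven digits after the slash
letters_per_value = 2  # two-letter prefix (BN or RC in the sample rows)

# Digit counts shown in the Most occurring characters table (the digit 3 is not listed).
shown_digits = {"7": 2905, "5": 2884, "1": 2869, "2": 2869, "0": 2803,
                "6": 2802, "4": 2798, "9": 2778, "8": 2768}

count_of_3 = n_values * digits_per_value - sum(shown_digits.values())
other_values = count_of_3 + n_values * letters_per_value
print(count_of_3, other_values)  # 2762 10830

# Prefix distribution over the five sample rows.
sample = ["BN/6363282", "BN/7765086", "RC/7145711", "RC/7292868", "BN/8508999"]
prefixes = Counter(v.split("/")[0] for v in sample)
print(prefixes)  # Counter({'BN': 3, 'RC': 2})
```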

Most common words

Value | Count | Frequency (%)
rc/1534651 | 1 | < 0.1%
rc/8422783 | 1 | < 0.1%
bn/2308183 | 1 | < 0.1%
rc/0626422 | 1 | < 0.1%
rc/2419496 | 1 | < 0.1%
bn/4813609 | 1 | < 0.1%
rc/3275050 | 1 | < 0.1%
rc/2338970 | 1 | < 0.1%
bn/5503072 | 1 | < 0.1%
bn/4344227 | 1 | < 0.1%
Other values (4024) | 4024 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
/ | 4034 | 10.0%
7 | 2905 | 7.2%
5 | 2884 | 7.1%
1 | 2869 | 7.1%
2 | 2869 | 7.1%
0 | 2803 | 6.9%
6 | 2802 | 6.9%
4 | 2798 | 6.9%
9 | 2778 | 6.9%
8 | 2768 | 6.9%
Other values (5) | 10830 | 26.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 40340 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 40340 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 40340 | 100.0%

gender
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

M: 2501
F: 2499

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: F
4th row: F
5th row: F

Common Values

Value | Count | Frequency (%)
M | 2501 | 50.0%
F | 2499 | 50.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most common words

Value | Count | Frequency (%)
m | 2501 | 50.0%
f | 2499 | 50.0%

Most occurring characters

Value | Count | Frequency (%)
M | 2501 | 50.0%
F | 2499 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 5000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 5000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 5000 | 100.0%

postal_code
Real number (ℝ)

Distinct: 4873
Distinct (%): 97.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50756.144
Minimum: 522
Maximum: 99892
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 522
5th percentile: 5288.2
Q1: 25633
Median: 50960
Q3: 75853.5
95th percentile: 94850.25
Maximum: 99892
Range: 99370
Interquartile range (IQR): 50220.5

Descriptive statistics

Standard deviation: 28914.729
Coefficient of variation (CV): 0.56967939
Kurtosis: -1.2134979
Mean: 50756.144
Median Absolute Deviation (MAD): 25096
Skewness: -0.028345452
Sum: 2.5378072 × 10^8
Variance: 8.3606157 × 10^8
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
73082 | 3 | 0.1%
55165 | 3 | 0.1%
85830 | 3 | 0.1%
92896 | 3 | 0.1%
62782 | 2 | < 0.1%
46707 | 2 | < 0.1%
66802 | 2 | < 0.1%
10029 | 2 | < 0.1%
70993 | 2 | < 0.1%
73928 | 2 | < 0.1%
Other values (4863) | 4976 | 99.5%

Minimum 10 values

Value | Count | Frequency (%)
522 | 1 | < 0.1%
562 | 1 | < 0.1%
563 | 1 | < 0.1%
565 | 1 | < 0.1%
604 | 2 | < 0.1%
615 | 1 | < 0.1%
629 | 1 | < 0.1%
648 | 1 | < 0.1%
661 | 1 | < 0.1%
677 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
99892 | 1 | < 0.1%
99885 | 1 | < 0.1%
99871 | 1 | < 0.1%
99856 | 1 | < 0.1%
99852 | 1 | < 0.1%
99820 | 1 | < 0.1%
99796 | 1 | < 0.1%
99697 | 1 | < 0.1%
99692 | 1 | < 0.1%
99681 | 1 | < 0.1%

Interactions

[Figures: 16 pairwise interaction plots]

Correlations

 | customer_id | gender | marital_status | nin | postal_code | tax_id
customer_id | 1.000 | 1.000 | 1.000 | -0.006 | -0.008 | 0.018
gender | 1.000 | 1.000 | 0.000 | 0.000 | 0.025 | 1.000
marital_status | 1.000 | 0.000 | 1.000 | 0.030 | 0.000 | 1.000
nin | -0.006 | 0.000 | 0.030 | 1.000 | -0.012 | -0.008
postal_code | -0.008 | 0.025 | 0.000 | -0.012 | 1.000 | -0.019
tax_id | 0.018 | 1.000 | 1.000 | -0.008 | -0.019 | 1.000
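The four High correlation alerts in the Overview can be read straight off this matrix: each flagged variable has a coefficient of 1.000 with exactly two others. A sketch that scans the matrix for off-diagonal coefficients above a threshold; the threshold of 0.9 is an assumption, not ydata-profiling's documented default, although any threshold from 0.03 up to (but not including) 1.0 yields the same four alerts here. Because customer_id and tax_id look like arbitrary identifiers, a perfect association with gender or marital_status is surprising and worth verifying against the underlying data.

```python
# Correlation matrix copied from the table above.
names = ["customer_id", "gender", "marital_status", "nin", "postal_code", "tax_id"]
matrix = [
    [1.000, 1.000, 1.000, -0.006, -0.008, 0.018],
    [1.000, 1.000, 0.000, 0.000, 0.025, 1.000],
    [1.000, 0.000, 1.000, 0.030, 0.000, 1.000],
    [-0.006, 0.000, 0.030, 1.000, -0.012, -0.008],
    [-0.008, 0.025, 0.000, -0.012, 1.000, -0.019],
    [0.018, 1.000, 1.000, -0.008, -0.019, 1.000],
]

def high_correlation_alerts(names, matrix, threshold=0.9):
    """Map each variable to the others whose |coefficient| exceeds the (assumed) threshold."""
    alerts = {}
    for i, name in enumerate(names):
        partners = [names[j] for j, r in enumerate(matrix[i]) if j != i and abs(r) > threshold]
        if partners:
            alerts[name] = partners
    return alerts

alerts = high_correlation_alerts(names, matrix)
for name, partners in alerts.items():
    print(f"{name} is highly correlated with {', '.join(partners)}")
```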

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
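Nullity correlation is commonly computed as the Pearson correlation between the missing-value indicators (1 = missing, 0 = present) of two columns; the missingno library's heatmap works this way, and whether ydata-profiling computes exactly the same quantity is an assumption. A stdlib sketch, using None for a missing value:

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation between the missingness indicators of two equally long columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a column that is never (or always) missing has no nullity variance
    return cov / (sx * sy)

# Toy columns: b is missing exactly where a is missing, c exactly where a is present.
a = [1, None, 3, None]
b = ["x", None, "y", None]
c = [None, 2, None, 4]
print(nullity_correlation(a, b), nullity_correlation(a, c))  # 1.0 -1.0
```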

Sample

First rows

index | customer_id | occupation | nin | passport | country_of_birth | marital_status | drivers_license | voters_card | tax_id | cac_number | gender | postal_code
0 | 89724233993 | Equality and diversity officer | 7.250263e+10 | XO6610220 | Malaysia | Single | U-ILUAQ44-1396390 | XPY/186/07/20045 | None | BN/6363282 | M | 08529
1 | 71294825978 | Occupational therapist | NaN | MK8799712 | American Samoa | Widowed | L-PRCHX61-1885792 | MSA/775/26/44500 | 4701744884 | BN/7765086 | F | 89203
2 | 99849097541 | Physiotherapist | 9.714721e+10 | None | Cook Islands | Married | O-XOJTJ30-0791028 | BKQ/568/35/91728 | 0328856397 | RC/7145711 | F | 08602
3 | 32795375109 | Psychologist, occupational | 5.301804e+10 | None | Mauritius | Married | P-ZEZIS45-1063384 | LIA/225/36/59688 | 5798081552 | None | F | 00661
4 | 96176222995 | Hydrologist | 1.569574e+10 | UZ2368406 | Zimbabwe | Single | None | None | 8799560272 | None | F | 88503
5 | 77364252258 | Stage manager | NaN | None | Korea | Single | X-LNUWH93-5717866 | VIC/056/69/83740 | 8988858393 | RC/7292868 | F | 79695
6 | 95739468736 | Scientist, research (life sciences) | 7.055348e+10 | F0385929 | French Polynesia | Widowed | C-FPBDD48-2527049 | SZI/197/59/72221 | None | BN/8508999 | F | 55126
7 | 42716803431 | Clinical molecular geneticist | NaN | None | Saint Vincent and the Grenadines | Married | X-EHLCX81-8541173 | AOW/423/33/46946 | 7737781422 | BN/2430902 | F | 32299
8 | 15808923190 | Scientist, water quality | 6.716493e+10 | V2133231 | El Salvador | Single | U-PINPB59-1051089 | CUT/648/23/54637 | 2912970628 | None | M | 69290
9 | 24574806522 | Therapist, art | NaN | None | Congo | Married | I-FEMSQ99-9704021 | PDC/305/83/57158 | 0934847368 | RC/7448566 | F | 87656

Last rows

index | customer_id | occupation | nin | passport | country_of_birth | marital_status | drivers_license | voters_card | tax_id | cac_number | gender | postal_code
4990 | 75448158133 | Teacher, special educational needs | 6.205005e+10 | ZV7328849 | Belize | Single | None | None | 7760550675 | BN/6602019 | F | 73395
4991 | 75268145704 | Equality and diversity officer | NaN | R6441050 | Saint Vincent and the Grenadines | Married | K-FAVAY57-1167847 | IGV/602/21/04594 | None | None | M | 41207
4992 | 98299252818 | Armed forces technical officer | NaN | None | Fiji | Widowed | None | KWH/316/62/57114 | None | None | M | 14384
4993 | 45262454278 | Social worker | 3.970058e+10 | None | France | Widowed | None | MPF/994/60/90310 | 2112970651 | BN/6386428 | F | 02554
4994 | 12445037534 | Administrator, sports | 2.246730e+10 | None | Mauritania | Single | None | URC/321/89/25761 | 1663752262 | BN/9635642 | M | 09190
4995 | 15860174900 | Research scientist (maths) | 9.891724e+10 | None | Croatia | Single | V-CQSEK72-7551835 | PJI/175/56/88076 | 4825132141 | BN/9657759 | F | 52966
4996 | 31385632341 | Psychologist, forensic | 2.766576e+10 | HH3127708 | Latvia | Single | P-BWCJB76-0337214 | SBQ/576/29/30139 | 2589942052 | RC/4697396 | F | 59597
4997 | 69767870710 | English as a foreign language teacher | 2.066151e+10 | None | Saint Kitts and Nevis | Single | L-WGRPD35-9964033 | NEH/531/69/07777 | 6534524020 | RC/6967565 | F | 64136
4998 | 62468608105 | Barrister | 7.742233e+10 | None | Saint Lucia | Widowed | Q-DAXEY00-5486546 | None | 7264167475 | BN/6262542 | M | 71869
4999 | 42474098214 | Health visitor | 3.549547e+10 | FZ8762915 | United Kingdom | Married | G-TIBMW62-0124660 | None | 8913371353 | RC/9169873 | M | 53553
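The sample rows show tax_id and postal_code values with leading zeros (for example 0328856397 in row 2 and 08529 in row 0), yet the profile treats both columns as real numbers, so numeric statistics such as the tax_id minimum of 2116573 describe the values with those zeros dropped. If the identifiers are meant to be fixed-width strings, a sketch that restores the width, assuming 10 digits for tax_id and 5 for postal_code (the widths of the sample values, not a documented format):

```python
def restore_width(value, width):
    """Render a numeric identifier as a zero-padded string; missing values stay None.

    The widths used below (10 for tax_id, 5 for postal_code) are assumptions
    based on the sample rows.
    """
    if value is None:
        return None
    return str(int(value)).zfill(width)

print(restore_width(328856397, 10))  # 0328856397
print(restore_width(8529.0, 5))      # 08529
print(restore_width(2116573, 10))    # 0002116573 (the reported tax_id minimum)
print(restore_width(None, 10))       # None
```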